Synopsis: Multi-column Attributes
Let’s reassemble how we should store multi-column attributes.
We'll cover the following
We may all have the experience of using or creating tables to store people’s contact information. Such tables often have common columns, such as the contact’s name, salutation, address, and sometimes company name.
Phone numbers are a little trickier. People use multiple numbers: a home number, a work number, a fax number, and a mobile number. In the contact information table, it’s easy to store these in four separate columns.
But what about additional numbers? The person’s assistant, second mobile phone, or field office have distinct phone numbers, and there could be other unforeseen categories. We could create more columns for the less common cases, but that really compromises the table’s outlook because it adds seldom-used fields to data entry forms. Indeed, how many columns are enough columns?
Objective: Store multi-value attributes#
This is the same objective as in the chapter Jaywalking: an attribute seems to belong in one table, but the attribute has multiple values. Previously, we saw that combining multiple values into a comma-separated string makes it hard to validate the values, read or change individual values, and compute aggregate expressions such as counting the number of distinct values.
We’ll use a new example to illustrate this antipattern. We want the bugs database to allow tags so we can categorize bugs. Some bugs may be categorized by the software subsystem that they affect — for instance, “printing”, “reports”, or “email”. The nature of the defect may categorize other bugs; for instance, a crash bug could be tagged “crash”, while we could tag a report of slowness with “performance”, and we could tag a bad color choice in the user interface with “cosmetic”.
The bug-tagging feature must support multiple tags because tags are not necessarily mutually exclusive. A defect could affect various systems or could affect the performance of printing.
Legitimate uses of the antipattern#
In some cases, an attribute may have a fixed number of choices and the position or order of these choices may be significant. For example, a given bug may be associated with several users’ accounts, but the nature of each association is unique. A user may be the one who reported the bug, while another is the programmer assigned to fixing the bug, and a third may be the quality control engineer assigned to verify the fix. Even though the values in each of these columns are compatible, their significance and usage actually make them logically different attributes.
It would be valid to define three ordinary columns in the Bugs
table to store each of these three attributes. The drawbacks described in this chapter may not be as important because we are likely to use the separate columns separately. Sometimes, however, we may still need to query over all three columns, for instance to report everyone involved with a given bug. But we can accept this complexity for a few cases in exchange for greater simplicity in most other cases.
Another way to structure this is to create a dependent table for multiple associations from the Bugs
table to the Accounts
table and give this new table an extra column to note each user account’s role concerning that bug. However, this structure may lead to some of the problems described in the chapter Entity-Attribute-Value.